The length of a typical Huffman codeword* 

Rüdiger Schack†

Department of Physics and Astronomy, University of New Mexico 

Albuquerque, NM 87131-1156 

January 11, 1993 



Abstract 



If $p_i$ ($i = 1, \ldots, N$) is the probability of the $i$-th letter of a memoryless source,
the length $l_i$ of the corresponding binary Huffman codeword can be very different
from the value $-\log p_i$. For a typical letter, however, $l_i \approx -\log p_i$. More
precisely,

$$P_m^- = \sum_{j \in \{i \,\mid\, l_i < -\log p_i - m\}} p_j < 2^{-m}
\quad\text{and}\quad
P_m^+ = \sum_{j \in \{i \,\mid\, l_i > -\log p_i + m\}} p_j < 2^{-c(m-2)+2},$$

where $c \approx 2.27$.

Introduction

Concepts from information theory gained new importance in physics [1], [?] when Bennett [?]
realized that Landauer's principle [?], which specifies the unavoidable energy cost $k_B T \ln 2$
for the erasure of a bit of information, is the clue to the solution of the problem posed by
Maxwell's demon. This problem can be summarized as follows: A demon knows initially
that a system is in the $i$-th possible state ($i = 1, \ldots, N$) with probability $p_i$. The demon
then finds the actual state of the system, thereby lowering the system's entropy
by the amount $H = -\sum_i p_i \log p_i$. This is in apparent violation of the second law of
thermodynamics, since the entropy decrease corresponds to a free-energy increase
$\Delta F = H k_B T \ln 2$ that can be extracted as work. Bennett resolved this inconsistency by noting
that in order to return to its original configuration the demon must erase its record of the
system state. The second law is saved since, due to Shannon's noiseless coding theorem, the
average length of the demon's record cannot be smaller than $H$. Therefore, the Landauer
erasure cost cancels the extracted work on the average.
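
As a rough numerical illustration of the two quantities in this argument, the following sketch computes the entropy $H$ in bits and the corresponding Landauer erasure cost $H k_B T \ln 2$. The example probabilities and the temperature of 300 K are assumptions chosen only for illustration; they do not come from the text.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

def entropy_bits(probs):
    """Shannon entropy H = -sum_i p_i log2 p_i, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def landauer_cost_joules(bits, temperature_kelvin):
    """Minimum energy k_B T ln 2 per bit, times the number of bits erased."""
    return bits * K_B * temperature_kelvin * math.log(2)

# Hypothetical state probabilities (assumed for this example only)
probs = [0.5, 0.25, 0.125, 0.125]
H = entropy_bits(probs)                 # 1.75 bits
work = landauer_cost_joules(H, 300.0)   # roughly 5e-21 J at an assumed 300 K
print(H, work)
```

On average, the work extracted by the demon and the cost of erasing its record are both given by this same product, which is the cancellation described above.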

If the demon wants to operate with maximum efficiency, it must use an optimal coding
procedure, i.e., Huffman coding [?]. In this context, the question arises as to how
the record length for the $i$-th state can be interpreted. Zurek [?] discusses two alternative
(sub-optimal) coding procedures for the demon: minimal programs for a universal
computer, where the record length is the algorithmic complexity [?] of the state; and
Shannon-Fano coding, where the record length is determined by the state's probability
through the inequality $-\log p_i \le l_i < -\log p_i + 1$. The length of a Huffman codeword,
on the other hand, is determined neither by the state's complexity nor by its probability.
Given $p_i$, the Huffman codeword length can, in principle, be as small as 1 bit and as large
as $[\log((\sqrt{5} + 1)/2)]^{-1} \approx 1.44$ times $-\log p_i$ [?].
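
The contrast between the two codes can be seen in a small example. The sketch below is a minimal illustration, not the author's procedure: `huffman_lengths` and `shannon_fano_lengths` are hypothetical helper names, the source distribution is made up, and Shannon-Fano lengths are taken to be $\lceil -\log_2 p_i \rceil$, which satisfies the inequality above.

```python
import heapq
import math

def huffman_lengths(probs):
    """Return binary Huffman codeword lengths for the given probabilities."""
    if len(probs) < 2:
        raise ValueError("need at least two letters")
    # Heap entries: (probability, unique tie-breaker, leaf indices below this node)
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, leaves1 = heapq.heappop(heap)  # least probable node
        p2, _, leaves2 = heapq.heappop(heap)  # second least probable node
        for leaf in leaves1 + leaves2:
            lengths[leaf] += 1  # each leaf below the new parent moves one level down
        heapq.heappush(heap, (p1 + p2, counter, leaves1 + leaves2))
        counter += 1
    return lengths

def shannon_fano_lengths(probs):
    """Lengths l_i = ceil(-log2 p_i), so that -log p_i <= l_i < -log p_i + 1."""
    return [math.ceil(-math.log2(p)) for p in probs]

probs = [0.9, 0.05, 0.03, 0.02]     # assumed example source
print(huffman_lengths(probs))       # [1, 2, 3, 3]
print(shannon_fano_lengths(probs))  # [1, 5, 6, 6]
```

For the three rare letters the Huffman codeword is two or three bits shorter than $-\log p_i$, while the Shannon-Fano length always stays within one bit of it.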

In this correspondence, we show that the lengths of both Huffman and Shannon-Fano
codewords have a similar interpretation. The probability of the states for which the Huffman
codeword length differs by more than $m$ bits from $-\log p_i$ decreases exponentially
with $m$. In this sense, one can say that, for a typical state, the Huffman codeword length
satisfies $l_i \approx -\log p_i$, just as for Shannon-Fano coding. This is especially relevant in a
thermodynamic context where entropies are of the order of $2^{80}$ bits and where an error of a few
hundred bits in the length of a typical record would be unnoticeable.

Result 

In this section we return to the terminology of the abstract and consider a discrete memoryless
$N$-letter source ($N \ge 2$) to which a binary Huffman code is assigned. The $i$-th
letter has probability $p_i < 1$ and codeword length $l_i$. The Huffman code can be represented
by a binary tree having the sibling property [?], defined as follows: The number of links
leading from the root of the tree to a node is called the level of that node. If the level-$n$
node $a$ is connected to the level-$(n + 1)$ nodes $b$ and $c$, then $a$ is called the parent of $b$ and
$c$; $a$'s children $b$ and $c$ are called siblings. There are exactly $N$ terminal nodes or leaves,
each leaf corresponding to a letter. Each link connecting two nodes is labeled 0 or 1. The
sequence of labels encountered on the path from the root to a leaf is the codeword assigned
to the corresponding letter. The codeword length of a letter is thus equal to the level of
the corresponding leaf. Each node is assigned a probability such that the probability of
a leaf is equal to the probability of the corresponding letter and the probability of each
non-terminal node is equal to the sum of the probabilities of its children. A tree has the
sibling property iff each node except the root has a sibling and the nodes can be listed
in order of nonincreasing probability with each node being adjacent to its sibling in the
list [?].
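
One way to see the sibling property at work is to record the nodes in the order in which Huffman's algorithm removes them. The sketch below is only an illustration under that reading: `huffman_merge_order` is a hypothetical helper name and the four-letter source is an assumed example. Each merge removes the two least probable remaining nodes, which become siblings, so siblings are adjacent in the recorded list, and reading the list backwards gives nonincreasing probabilities.

```python
import heapq

def huffman_merge_order(probs):
    """Return node probabilities in the order Huffman's algorithm removes them.

    Nodes are removed in pairs; the two nodes of a pair are siblings.
    The root is never removed and therefore does not appear.
    """
    heap = list(probs)
    heapq.heapify(heap)
    order = []
    while len(heap) > 1:
        a = heapq.heappop(heap)       # least probable remaining node
        b = heapq.heappop(heap)       # its sibling
        order.extend([a, b])
        heapq.heappush(heap, a + b)   # the parent replaces both children
    return order

order = huffman_merge_order([0.4, 0.3, 0.2, 0.1])  # assumed example source
listing = order[::-1]                               # nonincreasing probability
# Siblings occupy positions (0, 1), (2, 3), (4, 5) of the listing.
assert all(x >= y for x, y in zip(listing, listing[1:]))
```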

Definition: A level-$l$ node with probability $p$ (or, equivalently, a letter with probability $p$
and codeword length $l$) has the property $X_m^+$ ($X_m^-$) iff $l > -\log p + m$ ($l < -\log p - m$).

Theorem 1: $P_m^- = \sum_{j \in I_m^-} p_j < 2^{-m}$ where $I_m^- = \{i \mid l_i < -\log p_i - m\}$, i.e., the probability
that a letter has property $X_m^-$ is smaller than $2^{-m}$. (This is true for any prefix-free code.)

Proof: $P_m^- = 2^{-m} \sum_{j \in I_m^-} 2^{\log p_j + m} < 2^{-m} \sum_{j \in I_m^-} 2^{-l_j} \le 2^{-m}$. The last inequality follows
from the Kraft inequality.
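
Because Theorem 1 only relies on the Kraft inequality, it can be checked numerically for any list of prefix-free codeword lengths. The sketch below does this for one assumed example; `p_minus` is a hypothetical helper name, and the probabilities and lengths are chosen for illustration only.

```python
import math

def p_minus(probs, lengths, m):
    """Total probability of the letters with l_i < -log2(p_i) - m."""
    return sum(p for p, l in zip(probs, lengths) if l < -math.log2(p) - m)

probs = [0.9, 0.05, 0.03, 0.02]  # assumed example source
lengths = [1, 2, 3, 3]           # Kraft sum: 1/2 + 1/4 + 1/8 + 1/8 = 1

# The lengths satisfy the Kraft inequality, so a prefix-free code exists.
assert sum(2.0 ** -l for l in lengths) <= 1.0

for m in range(0, 6):
    # Theorem 1: the probability of property X_m^- stays below 2^(-m).
    assert p_minus(probs, lengths, m) < 2.0 ** -m
```

In this example the three rare letters have property $X_m^-$ for $m \le 2$, with total probability 0.1, and no letter has it for $m \ge 3$.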

Lemma: Any node with property $X_m^+$ has probability $p < 2^{-c(m-1)}$ where $c = (1 - \log g)^{-1} - 1 \approx 2.27$
with $g = (\sqrt{5} + 1)/2$.

Proof: Property $X_m^+$ implies $l > \lfloor -\log p + m \rfloor$ where $\lfloor x \rfloor$ denotes the largest integer less than
or equal to $x$. It is shown in Ref. [?] that, if $p$ and $l$ are the probability and level of a given
node, $p \ge 1/F_n$ implies $l \le n - 2$ for $n \ge 3$ where $F_n = [g^n - (-g)^{-n}]/\sqrt{5} \ge g^{n-2}$ is the $n$-th
Fibonacci number ($n \ge 1$). Therefore, if $\lfloor -\log p + m \rfloor \ge 1$, the inequality $l > \lfloor -\log p + m \rfloor$
implies $p < (F_{\lfloor -\log p + m \rfloor + 2})^{-1} \le g^{-\lfloor -\log p + m \rfloor} < g^{\log p - m + 1}$. For $\lfloor -\log p + m \rfloor < 1$,
$p < g^{\log p - m + 1}$ holds trivially. Solving for $p$ proves the lemma.
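
The last step, solving $p < g^{\log p - m + 1}$ for $p$, can be spelled out as follows. This is a sketch that takes all logarithms to base 2, consistent with codeword lengths measured in bits, and uses $0 < \log g < 1$ so that dividing by $1 - \log g$ preserves the inequality.

```latex
\begin{align*}
p < g^{\log p - m + 1}
  &\iff \log p < (\log p - m + 1)\,\log g \\
  &\iff (1 - \log g)\,\log p < -(m - 1)\,\log g \\
  &\iff \log p < -(m - 1)\,\frac{\log g}{1 - \log g} = -c\,(m - 1), \\
\text{where}\quad
c &= \frac{\log g}{1 - \log g} = \frac{1}{1 - \log g} - 1 \approx 2.27 .
\end{align*}
```

Hence $p < 2^{-c(m-1)}$. With $g = (\sqrt{5} + 1)/2$ one has $\log g \approx 0.694$, which gives the stated value of $c$.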

Theorem 2: $P_m^+ = \sum_{j \in I_m^+} p_j < 2^{-c(m-2)+2}$ where $I_m^+ = \{i \mid l_i > -\log p_i + m\}$, i.e., the
probability that a letter has property $X_m^+$ is smaller than $2^{-c(m-2)+2}$.
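
The bound of Theorem 2 can be checked numerically on a source that produces unusually long Huffman codewords. The sketch below is an illustration only: the helper names `huffman_lengths` and `p_plus` are hypothetical, and the source with probabilities proportional to the first 20 Fibonacci numbers is an assumed test case, chosen because it yields a maximally unbalanced code tree.

```python
import heapq
import math

def huffman_lengths(probs):
    """Return binary Huffman codeword lengths for the given probabilities."""
    # Heap entries: (probability, unique tie-breaker, leaf indices below this node)
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, leaves1 = heapq.heappop(heap)
        p2, _, leaves2 = heapq.heappop(heap)
        for leaf in leaves1 + leaves2:
            lengths[leaf] += 1  # leaves below the new parent move one level down
        heapq.heappush(heap, (p1 + p2, counter, leaves1 + leaves2))
        counter += 1
    return lengths

def p_plus(probs, lengths, m):
    """Total probability of the letters with l_i > -log2(p_i) + m."""
    return sum(p for p, l in zip(probs, lengths) if l > -math.log2(p) + m)

G = (math.sqrt(5) + 1) / 2
C = 1 / (1 - math.log2(G)) - 1  # the constant c, about 2.27

# Assumed test source: probabilities proportional to Fibonacci numbers
fib = [1, 1]
while len(fib) < 20:
    fib.append(fib[-1] + fib[-2])
total = sum(fib)
probs = [f / total for f in fib]
lengths = huffman_lengths(probs)

for m in range(1, 8):
    bound = 2.0 ** (-C * (m - 2) + 2)
    assert p_plus(probs, lengths, m) < bound
```

Note that the bound exceeds 1 for $m \le 2$ and only becomes informative from $m = 3$ onward; in this example the letters with property $X_3^+$ carry a total probability of about 0.003.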

Proof: Suppose there is at least one letter, and hence a corresponding leaf, having the
property $X_m^+$. Then, among all nodes having the property $X_m^+$, there is a nonempty subset
with minimum level $n_0 > 0$. In this subset, there is a node having maximum probability
$p_0$. In other words, there is no node having property $X_m^+$ on a level $n < n_0$, and on level
$n_0$, there is no node with probability $p > p_0$. Thus property $X_m^+$ implies

$$p_0 > 2^{-n_0 + m}.$$

Now let $k_0$ be the number of nodes on level $n_0 - 1$, and define the integer $l_0 < n_0$ such
that $2^{l_0} \le k_0 < 2^{l_0 + 1}$. Then the number of level-$n_0$ nodes is less than $2^{l_0 + 2}$. Since all nodes
having property $X_m^+$ are on levels $n \ge n_0$, it follows that

$$P_m^+ < 2^{l_0 + 2} p_0.$$

In order to turn this into a useful bound, note the following. The sibling property or,
more directly, the optimality of a Huffman code implies that all level-$(n_0 - 1)$ nodes have
probability $p \ge p_0$. Since there are at least $2^{l_0}$ level-$(n_0 - 1)$ nodes, it is again a consequence
of the sibling property that there exists a level-$(n_0 - 1 - l_0)$ node with probability
$p_1 \ge 2^{l_0} p_0 > 2^{-n_0 + m + l_0}$ and thus having property $X_{m-1}^+$. Using the lemma, one finds
$p_1 < 2^{-c(m-2)}$ and therefore

$$P_m^+ < 2^{l_0 + 2} p_0 \le 2^2 p_1 < 2^{-c(m-2)+2}.$$